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LEAPFROG  VARIANTS  OF  ITERATIVE  METHODS 
FOR  LINEAR  ALGEBRAIC  EQUATIONS 

Paul  E.  Saylor 


ABSTRACT 

Two  iterative  methods  are  considered,  Richardson’s  method  and  a  gen¬ 
eral  second  order  method.  For  both  methods,  a  variant  of  the  method  is 
derived  for  which  only  even  numbered  iterates  are  computed.  The  variant 
is  called  a  leapfrog  method.  Comparisons  between  the  conventional  form 
of  the  methods  and  the  leapfrog  form  are  made  under  the  assumption  that 
the  number  of  unknowns  is  large.  In  the  case  of  Richardson’s  method,  it  is 
possible  to  express  the  final  iterate  in  terms  of  only  the  initial  approxima¬ 
tion,  a  variant  of  the  iteration  called  the  grand-leap  method.  In  the  case  of 
the  grand-leap  variant,  a  set  of  parameters  is  required.  An  algorithm  is  pre¬ 
sented  to  compute  these  parameters  that  is  related  to  algorithms  to  compute 
the  weights  and  abscissas  for  Gaussian  quadrature.  General  algorithms  to 
implement  the  leapfrog  and  grand-leap  methods  are  presented.  Algorithms 
for  the  important  special  case  of  the  Chebyshev  method  are  also  given. 
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for  Computer  Applications  in  Science  and  Engineering  (ICASE),  NASA  Langley  Research 
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1.  Introduction. 

The  subject  of  this  paper  is  a  set  of  techniques  to  improve  efficiency  in  the 
iterative  solution  of  a  real  or  complex  linear  system  Ax=b,  especially  for  the 
solution  of  large  problems  on  supercomputers. 

An  iterative  method  generates  a  sequence  x^'^,  ...  .  For  the 

methods  of  this  paper,  a  variant  such  that  x^’^  can  be  expressed  directly  in  terms 
of  with  no  dependence  on  will  be  called  a  leapfrog  method.  A  variant 

of  Richardson’s  method  is  also  presented  for  which  the  final  iterate  is  computed 
from  the  intial  approximation  with  no  computation  of  intermediate  iterates.  This 
will  be  called  the  grand-leap  method.  The  advantages  of  the  leapfrog  and 
grand-leap  methods  are:  (i)  a  slight  reduction  in  some  cases  in  the  total  number 
of  arithmetic  operations;  (ii)  an  increase  in  the  number  of  terms  in  vector  sums, 
an  advantage  on  supercomputers  that  “chain”,  i.e.,  transmit  results  from  one 
arithmetic  unit  directly  to  another;  and  (iii)  a  reduction  in  I/O  operations  for 
large  problems.  In  his  Ph.  D.  thesis  [Chro86j  studied  methods  to  omit 
intermediate  successive  iterates  for  the  conjugate  gradient  method  as  a  way  to 
allow  parallel  computation  of  matrix  vector  products.  His  goals  overlapped 
somewhat  with  those  of  this  paper  but  the  approach  is  not  the  same. 

Two  iterative  methods  are  considered:  Richardson’s  method  [FoWa60, 
HaYoSlj  and  a  general  second  order  iterative  rpethod.  For  Ricb"rdpon'.s  m'^thod 
the  leapfrog  method  was  used  in  [SmolSl,  SmSaSSj  as  a  technique  to  avoid 
complex  arithmetic.  In  this  paper  its  other  properties  are  explored. 
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Richardson’s  method  is  an  old  method  the  advantages  of  which  have 
generally  been  ignored;  however,  see  [AnGo72j.  In  the  symmetric  positive  definite 
case,  Richardson  iteration  parameters  do  not  yield  an  optimum  iterate  at  each 
step  whereas  a  second  order  method  does.  This  is  one  reason  for  the  neglect  of 
Richardson’s  method.  However,  in  a  paper  of  Tal-Ezer  [Tal87],  a  novel  approach 
is  described  in  which  Richardson  iterates  are  almost  optimum  at  each  step. 


The  Chebyshev  iteration  is  an  example  of  a  second  order  method  |Mnt77, 
Mnt78],  used  for  the  solution  of  nonsymmetric  systems.  The  Chebyshev  iteration 
is  not  applicable,  however,  unless  the  eigenvalues  of  A  lie  in  a  half  plane. 
Furthermore,  the  Manteuffel  adaptive  algorithm  [Mnt78]  assumes  the  eigenvalues 
appear  in  complex  conjugate  pairs,  which  holds  if  the  matrix  is  real.  This  is  a 
brief  argument  for  the  use  of  Richardson’s  method  if  the  matrix  is  either  a 
general  real  nonsymmetric  matrix  with  eigenvalues  in  both  half  planes  or  is  a 
complex  matrix  the  eigenvalues  of  which  do  not  appear  in  complex  conjugate 
pairs.  It  should  be  stressed  that  large  complex  matrices  arise  in  signal  processing, 
and  constitute  an  important  class  of  problems. 


1.1.  Summary.  In  §2,  the  leapfrog  version  of  Richardson’s  method  is  derived. 
Iteration  parameters  are  assumed  given,  with  the  exception  of  the  Chebyshev 
case  for  which  explicit  formulas  are  given  as  well  as  an  algorithm.  With  p’-operly 
chosen  pcii.j.nieters,  the  method  applies  to  any  real  or  complex  matrix. 
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In  §3,  the  grand-leap  method  is  presented  for  computing  the  final 
Richardson's  method  iterate  in  terms  of  the  initial  iterate.  An  algorithm  is  also 


given. 


Comparisons  among  the  conventional,  leapfrog  and  grand-leap  versions  of 
Richardson’s  method  are  made  in  §4. 

In  §5,  the  general  formula  for  a  second  order  method  is  stated,  and  a  leapfrog 
version  derived.  An  algorithm  (Algorithm  3)  is  stated  in  which  the  parameters  are 
assumed  given.  Algorithms  for  these  parameters  are  presented  in  §7. 

Optimum  L ^-iteration  parameters  are  defined  in  the  Chebysbev  case  in  §2. 
In  §6,  L2-optimum  parameters  are  defined.  Optimum  L2-Richardson’s 
parameters,  in  the  case  of  real  eigenvalues,  are  the  roots  of  an  orthogonal 
polynomial.  An  algcrrithm  to  compute  roots  of  orthogonal  polynomials  is 
developed,  which  is  an  implementation  of  the  Stieltjes  algorithm  [Gaut82  |  and 
related  to  an  algorithm  presented  in  [GoWe69]  for  the  weights  and  nodes  for 
Gaussian  quadrature.  This  algorithm  is  modified  to  yield  the  quantities  required 
to  execute  the  grand-leap  method.  The  L2-approach  is  only  one  approach  to 
optimum  parameters.  A  non-L^  treatment  is  given  in  [ElSt85,  Tal87].  Each  of 
these  references  is  more  general  than  the  L2-niethods  in  §6.  A  completely  general 
L2-approach  may  be  based  on  [SaSm88]  but  is  beyond  the  scope  of  this  paper. 

Algorithms  based  on  the  methods  in  §6  are  gathered  and  presented  in  o7. 
Algorithms  for  the  special  and  important  Chebyshev  case  are  also  given  in  §7. 
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1.2.  Conventions  and  Notation.  Although  an  U-inner  product  is  a  special 
case  of  the  Z/2-inner  product  if  the  measure  is  chosen  correctly,  for  clarity  and 
convenience,  the  two  are  used  separately. 

The  solution  of  a  linear  set  of  equations  Ax  =  b  generally  requires  that  the 
set  be  preconditioned  by  transforming  it  into  a  set  such  as,  for  example, 
CAx  =  Cb  for  which  the  iterative  method  converges  more  rapidly.  (Other 
preconditionings  yield  systems  such  as  CAQQ“^x  =  Cb.  but  the  same  remarks 
hold  for  these  other  cases.)  There  is  no  change  in  the  techniques  or  algorithms 
presented  in  this  paper  if  they  are  applied  to  CAx  =  Cb  rather  than  Ax  =  b 
other  than  the  change  in  the  matrix  from  A  to  CA.  It  will  therefore  be  assumed 
that  A  is  the  preconditioned  matrix. 

It  will  be  convenient  from  time  to  time  to  state  that  an  algorithm  converges 
with  no  restriction  on  the  input  data  and  if  certain  conditions  on  the  eigenvalues 
are  met.  In  a  practical  sense,  of  course,  there  are  restrictions  on  the  data  such  as 
those  needed  to  prevent  overflow,  which  may  be  infeasible  to  analyze  and 
formulate. 

Matrices  and  vectors  are  denoted  by  boldface  type. 

The  number  of  unknowns  is  denoted  by  N. 


2.  Richardson  s  Method:  Conventional  Form  and  Leapfrog  Form. 

This  section  begins  with  a  stat'^.ment  of  Richardson’s  method,  from  which  the 
leapfrog  form  is  easily  derived.  The  Chebyshev  case  is  outlined  and  an  algorithm 


given. 


2.1.  Conventional  Richardson’s  Method.  Let  Tq  ,  ...,  r;^_j  be  a  cycle  of 
iteration  parameters  where  k  is  called  the  period.  The  purpose  of  iteration 
parameters  is  to  reduce  the  error,  but  discussion  of  this  is  postponed  until  later. 
For  now,  attention  is  directed  to  the  iteration,  and  the  reader  is  asked  to  accept 


the  parameters  as  given. 

Let  be  an  initial  guess.  Richardson’s  method  is  defined  as  follows.  For 

t  =1,  ...,  k , 


(2.1.1) 


2.2.  Leapfrog  Form.  The  recursion 


(2.2.1) 


will  be  used,  which  may  be  derived  by  first  subtracting  (2.1.1)  from  x=x  to 


where  ^=x— and  then  multiplying  (2.2.2)  by  A.  Since  (2.2.1) 

follows.  Vectors  and  are  called  the  (true)  error  and  the  residual  error 
respectively. 

The  leapfrog  step  from  to  x^*)  results  from  using  (2.2.1)  in  (2.1.1)  to 

give  [SmSaSS],  for  i  =  2,  4,  ...,  k  (under  the  assumption  that  k  is  even) 

+T,_j  I-7-,_2a] 

=x(‘  +(r,  _i  -t-r,  _2)r(’  -  r,  _ i Ar''  . 


2.3.  Optimum  Chebyshev  Parameters.  It  is  easy  to  show  that  the  error 


and  residual  error  satisfy 


e('=)=ii:,(A)e(°)  , 

r(*)=i?fc(A)r(°) 


(2.3.1) 


where  Rj^  is  the  polynomial 


i?,(c)=  n(l-r,_,f). 

t  *1 


Any  polynomial  such  that  /2^(0)  —  1  is  called  a  r'esidual  polynomial  [Si\eoS\. 
Parameters  are  chosen  to  minimize  /?fc(A)  in  some  sense,  to  be  discussed  next. 


Let  /?  be  a  set  containing  the  spectrum  of  A.  Usually  one  thinks  of  Q  as  an 


interval  or  union  of  intervals  on  the  real  line.  However  if  A  is  nonsymmetric. 
then  both  Q  and  the  spectrum  may  lie  off  the  real  axis  in  the  complex  plane.  Two 
commonly  used  methods  to  minimize  R/dA.)  are  either  to  minimize  the  L^,-norm 
of  polynomial  over  fl  or  to  minimize  a  weighted  L2-norm  over  i7.  In  this 

part,  only  the  L^-norm  will  be  discussed.  How  Chebyshev  polynomials  are  used 
to  minimise  this  norm  will  now  be  outlined.  The  papers  of  Manteuffel  give  more 
details  [Mnt77,  Mnt78j. 

The  Chebyshev  residual  polynomial  is  defined  by 


where  is  the  Chebyshev  polynomial  of  degree  k,  and  d  and  c  are  parameters 
defining  a  confocal  family  of  ellipses:  d  is  the  center  and  the  foci  are  d  ±  c . 
Henceforth,  in  the  Chebyshev  case,  the  set  J?  containing  the  nonzero  spectrum 
will  be  an  ellipse.  (An  ellipse,  as  the  term  is  used  here,  means  the  union  of  the 
curve  and  its  interior.)  Parameter  c  is  assumed  either  real  or  purely  imaginary; 
in  either  case,  c^  is  real.  (If  c  =  0,  Rk(<:)  reduces  to  (<;  —  d)^  /d^ .)  .Assume  that  d 
is  real  and  one  member  of  the  confocal  family  with  center  d  and  foci  d  ±  c  is  in 
the  interior  of  the  right  half  plane,  i.  e.,  d  >  0.  (If  there  is  one  in  the  left  half 
plane,  then  we  consider  —A  instead  of  A.)  These  assumptions  mean  respectively 
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that  the  major  axis  of  each  ellipse  of  the  family  is  either  on  the  real  line  or  that 
the  major  axis  of  each  ellipse  is  perpendicular  to  the  real  line.  If  the  major  axis  is 
on  the  real  line,  then  the  assumption  that  at  least  one  member  of  the  family  lie  in 
the  interior  of  the  right  half  plane  means  that  d  —  \  c\  >0.  Finally,  note  that  if 
the  eigenvalues  are  real,  then  ellipse  /?  may  be  assumed  to  be  the  interval 
—  I  c  i  ,  d  -f-  1  c  I  j,  which  is  the  degenerate  ellipse  of  the  confocal  family. 

Among  all  residual  polynomials,  it  may  be  shown  Mnt78  that  the 
Chebyshev  residual  polynomial  has  the  minimum  L^-norm  over  the  interval 
n—-\d  —  \  c\  ,d+|  c|  ],  and  closely  approximates  the  residual  polynomial  with 
minimum  L^-norm  over  any  ellipse,  /?,  with  center  d  and  foci  d  ±  c. 

It  is  not  necessary  that  d  and  c"  be  real  in  order  that  the  Chebyshev 
iteration  converge.  The  reason  for  assuming  above  that  these  are  real  quantities 
is  connected  with  the  Manteuffel  algorithm  [Mnt78i.  The  Manleuffel  algorithm  is 
valid  only  when  the  eigenvalues  of  A  appear  in  complex  conjugate  pairs,  and  it  is 
this  that  leads  to  the  assumption  that  d  and  c'  are  real.  In  general  for  any  d  and 
c  ,  convergence  results  if  there  is  one  ellipse  with  center  d  and  foci  d  ±  c 
containing  the  eigenvalues  that  does  not  also  contain  the  origin.  .As  a  practical 
matter,  however,  the  Chebyshev  iteration  is  useful  only  when  d  and  c  can  be 
obtained  by  some  technique  such  as  the  Manteuffel  algorithm. 

In  the  Chebyshev  case  the  Richardson  iteration  parameters  are  derived  from 
the  roots,  {p,  j-,  of  T^.  which  are 


Pi  =cos 


TT 


2k 


,  i  =0,  fc-1. 


The  roots  of  are  therefore  d  +  cpj,  for  i  =0,  A:— 1  and  the  parameters  for 

Richardson’s  method  are 


1 

'  cp,  +  d 

2.4.  An  Algorithm  in  the  Chebyshev  Case.  First  a  technical  note  on 
avoiding  complex  arithmetic:  If  the  major  axis  is  vertical,  i.e..  if  c  is  pure 
imaginary,  then  the  roots  of  occur  in  conjugate  pairs.  It  is  an  advantage  to 
order  the  parameters  jr,}  in  such  a  way  that,  in  this  case,  r,  and  r,  _2  are 
conjugate  pairs  (in  order  that  r,_j  +  r^_2  and  r, _ir;_2,  required  by  the  algorithm, 
are  real),  a  task  equivalent  to  ordering  the  roots,  {p,  },  of  T^.  in  such  a  way  that 
Pt  -2  ~  “Pt  -1- 

Let  h  =  nj2k.  The  roots  of  Tj^  are  the  cosine's  of 
9^  =  h ,  $2  ~  n  —  h ,  $2  =  3h ,  6^  =  n  ~  3h ,  ...  .  U  k  is  even,  which  it  is  in  the 

leapfrog  case,  the  last  two  roots  in  this  ordering  are  the  cosine's  of  h 

and  9^.  =  ~  +  h.  Moreover,  cos^i  =  —  cos^ji  In  the  algorithm,  the  formula 


I 


2 

1  + 1 

2 

1  1 

. 
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—  1  [{  — 1)‘  ’/i  +  n-modfi +1.  ‘2) 


I* 


for  z  =  1,  k,  will  be  used.  If  p,  =costf--,  then  in  the  cj.se  when  c  is  pure 
imaginary,  {d  +  c  p^,  d  +  CP2},  {d  +  c  p^,  d  -f-  CP4},  ...  is  a  sequence  of  conjugate 


pairs. 


If  the  major  axis  is  real,  the  error  is  reduced  after  a  cycle  of  exactly  k 
parameters,  but  the  algorithm  may  not  have  converged.  If  the  major  axis  is 
vertical,  the  error  is  reduced  only  if  k  is  sufficiently  large,  a  requirement  that  in 
practical  applications,  however,  is  observed  to  be  reasonable.  If  the  algorithm  has 
not  converged,  the  cycle  of  parameters  is  repeated.  After  the  parameters  have 
been  recycled  /  times,  the  error,  =x  —  satisfies  =  |i2j.(A)| 


But  where  Rj^i  is  the  optimum  Chebyshev  residual  polynomial 

of  degree  kl.  This  is  a  basic  problem  with  Richardson’s  method;  It  is  optimum 
only  at  the  end  of  one  cycle  of  parameters.  For  an  alternative  approach,  not 
based  on  Chebyshev  parameters,  see  [Tal87]  in  which  a  method  is  proposed  to 
increase  the  number  of  parameters  in  an  almost  optimum  way  (thus  the  period  is 
not  fixed)  until  convergence  is  achieved. 

Algorithm  1.  (Leapfrog  Richardson’s  method  in  the  Chebyshev  case.) 

Purpose.  Execute  Richardson’s  method  with  Chebyshev  parameters  and  omit 
alternate  steps. 

Input.  Matrix  A,  right  side  b,  initial  guess  cycle  k,  and  ellipse 

parameters  d  and  c.  The  ellipse  parameters  are  assumed  known,  for  example,  as 
output  from  the  Manteuffel  algorithm  [Mnt78,  .Ashb85].  The  user  must  also 
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provide  a  maximum  number  of  cycles  of  iterations  and  an  error  criterion  to  halt 
the  iteration. 

Output.  Iterate  the  iterate  reached  after  the  last  cycle  of  k  parameters 
in  the  standard  execution  of  Richardson’s  method. 

Restrictions.  If  d  and  are  real,  then  d  is  assumed  either  positive  or 
negative  and  ''vill  be  assumed  positive  without  loss  of  generality. 
Also,  for  real  c,  0  <  —  |  c )  .  In  general  for  any  d  and  c^,  convergence  results  if 

there  is  one  ellipse  with  center  d  and  foci  d  ±  c  containing  the  eigenvalues  that 
does  not  also  contain  the  origin.  If  the  matrix  is  singular,  the  algorithm 
converges  to  a  solution  if  the  system  is  consistent.  Period  k  is  even. 

Notes,  (l)  Quantities  0,  ,  p,-,  and  r,  need  not  be  array  variables  since  only 
three  values  are  used  during  execution,  but  subscripts  make  the  algorithm  more 
convenient  to  state.  (2)  A  slight  modification  of  the  algorithm  would  allow  {r, }  to 
be  an  input  array,  for  example,  from  Algorithm  4. 


1)  Set  h  :=  . 

^  2k 


2)  Do  either  until  convergence  or  a  limit  on  the  number  of  loops  is  exceeded. 
2.1)  For  i  =  2  to  /c  by  2  do: 


2.1.1)  Set  9,_i  :=  12 


-  1 


(— 1)‘  +  7rmod(i ,  2). 


2.1.2)  Set  e,  ;=  2 


i  +  1 


—  1  H— 1)'  'h  +  7rmod(t  4-  1,  2). 


m 


2.1.3)  Set  p,_j  :=  cos^,  _j  . 

2.1.4)  Set  /),  ;=  cos^,  . 

2.1.5)  Setr._2  - . 

d  + 

2.1.6)  Set  r._i  :=  — ^ - . 

d  +  c  Pi 


2.1.7)  Set  a  :=  t-,  _2  +  r,  _i  . 

2.1.8)  Set  u  :=  . 


2.1.9)  Set  r(‘-2)  :=  b  -  Axt*-^) 


2.1.10)  Set  t  :=  . 

2.1.11)  Set  x(‘>  :=  ^  ^^(t- 


2.1.12)  Endfor. 


2.2)  If  not  converged,  set  x^°^  :=  x^*^^  . 


2.3)  Enddo. 
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3.  Richardson’s  Method  in  One  Step. 

It  is  easy  to  see  that  leapfrog  could  be  continued  further  to  allow  computing 
from  Ultimately,  one  arrives  at  an  expression  for  x^*^  in  terms  of  x^°^  with 

no  intermediate  approximations,  the  form  of  which  is,  as  will  be  seen 
momentarily. 


x(<=)=x^°^+Ci_i(A)r^°)  ,  (3.1) 

where  C;^_i  is  a  polynomial  of  degree  A:— 1.  Computing  x^'^^  only  from  x^°^  while 
omitting  the  computation  of  any  intermediate  approximation  will  be  called  the 
grand-leap. 


3.1.  Krylov  Subspace  Methods.  Richardson’s  method  is  an  example  of  a 
Krylov  subspace  method,  i.  e.,  for  1  <  t  , 

x(‘)  _  x{°)  e  v;  ,  (3.1.1) 

where  Vj  is  the  Krylov  subspace  defined  by 

U,-  =span|r^°^,  .  .  ., 

The  proof  of  (3.1.1)  is  an  easy  induction  based  on  x^*^  =  x^‘ -f- r, _jr^‘ 
Membership  relation  (3.1.1)  implies  (3.1). 


-V-, 


I 


$ 

I 


t 

I 


I 

:S 


3.2.  An  Expression  for  C;._i.  Multiply 


c(^)  _x(0)  =x(*)  -X  +x  -x^°)  =Ct_i(A)r(°) 


by  A  to  obtain 


[.(0)  _!.(*=)  =AC'^_j(A)r^ 


Since 


r('=)=i2^(A)r(°)  , 


it  follows  that 


Cc-i(f)- 


which  is  a  polynomial  since  /2^(0)  =  1. 


Therefore  if 


+  ■  ■  ■  +^lC+l 


'+  •  •  •  4-01- 


(3.2.1J 


The  representation  of  any  polynomial  in  f  in  terms  of  powers  of  c  is  called  the 
power  form. 


WWW5 


I'S' 

Ivl 

TO* 
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3.3.  A  Remark  on  Polynomial  Preconditioning.  Assume  that  residual 
polynomial  is  small  on  set  fi  (containing  the  spectrum  of  A.)  Therefore,  on  J7, 
is  an  approximation  to  and  Cjt_i{A)  is  an  approximation  to  A'\ 
Polynomial  arises  in  so-called  polynomial  preconditioning  iAdm82,  AMS87, 

Chen82,  JMP83,  Sayl83,  Tal87j. 


3.4.  Methods  to  Compute  important  Chebyshev  case, 


Rk{&- 


The  coefRcients,  6^ ,  could  be  easily  determined  by  expanding 


d  -c 


in  terms 


of  powers  of  $•.  In  principle,  the  coefBcients  of  any  residual  polynomial  could  be 
determined  in  the  same  way,  although  no  residual  polynomial  is  as  well 
documented  as  the  Chebyshev  case. 


Despite  the  simplicity  of  this  approach,  it  has  the  unfavorable  feature  that 
even  when  the  coefficients,  ,  are  known  explicitly,  it  is  numerically  difficult  to 
compute  the  vector  d=^^A*'~^r^°^+  •  •  •  due  to  the  ill  conditioning  of  the 

basis  {r^°\  •  •  •  ,A^~^r^°^},  if  k  is  large.  However,  to  avoid  instability  it  is  often 
sufficient  to  take  a  small  value  of  k,  say  k  =5.  If  the  power  form  coefficients  are 
known  then  nested'  polynomial  evaluation  (Horner’s  rule)  could  be  used  to 
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^  "h  % 


compute  C7^._i(A)rf°^. 

An  algorithm  is  presented  later,  Algorithm  2,  in  which  the  roots, 
{<T,  :  i  =1,  ...,  k  —  l},  and  the  leading  coefficient,  Pjt-i>  ^k~i  assumed 

given.  The  roots  of  C'j._i  may  be  computed  from  the  power  form  (for  example  by 
computing  the  eigenvalues  of  the  companion  matrix.)  A  method  and  algorithm 
(Algorithm  5)  are  presented  in  §6  that  do  not  require  the  power  form  coefficients. 
Also  see  [Tal87j  for  a  non-L2  approach. 


3.4.1.  Avoiding  Complex  Arithmetic.  If  A  is  real,  then  it  is  reasonable  to 
assume  that  the  coefficients  of  C)._i  are  real.  If  so,  then  the  roots  of  Cj^_i  occur 
in  complex  conjugate  pairs.  Let  o  and  <7  be  a  conjugate  pair.  Since 


(A  — cr)(A  — =  |a^  —  (cr+a^A4- 1  a|  ^  j  u  , 

no  complex  arithmetic  is  required  to  evaluate  Cjt_i(A)r^°^  when  the  factored  form 
is  used.  (In  the  general  case  when  A  is  complex,  the  roots  do  not  occur  in 
conjugate  pairs.) 


3.5.  Algorithm  for  the  Grand-Leap. 

Algorithm  2.  (Compute  x*^^=x^°^+Cjt_i(A)r^°^^ 


s 


Purpose.  Compute  the  final  iterate  from  the  initial  guess  with  no 


intermediate  iterates  computed,  except  those  at  the  end  of  each  cycle. 

Input.  Matrix  A,  right  side  b,  the  intitial  guess  x^°^,  period  k,  the  leading 

fc-i 

coefficient,  ^*.-1  tti®  roots  ct^,  ...,  aj._i  of  i-e.,  IT  —  o ^). 

Parameter  Tq,  which  is  the  reciprocal  of  the  root  of  R^,  is  required  if  A:  =  1.  These 
parameters  are  generated  from  Algorithm  5,.  and,  in  the  Chebyshev  case,  from 
Algorithm  6.  However,  there  are  other  sources  for  the  parameters  such  as  Tal87  . 
The  user  must  also  provide  a  maximum  number  of  cycles  of  iterations,  and  an 
error  criterion  to  halt  the  iteration. 

Output.  Iterate  the  last  iterate  reached  after  a  cycle  of  k  parameters  in 
the  standard  execution  of  Richardson’s  method. 

Restrictions.  The  algorithm  executes  with  no  restrictions  on  the  input  data. 
However  in  order  for  the  algorithm  to  converge  to  the  solution  of  Ax  =  b.  for  all 
b,  it  is  necessary  that  |  Ri^{\^)\  =|  1  —  Xj )|  <1  for  each  nonzero 

eigenvalue,  X, ,  of  A.  If  this  holds  and  A  is  singular,  the  algorithm  converges  to  a 
solution  when  the  system  is  consistent. 


1)  Set  :=b  —  Ax^°). 


2)  If  /c  =  1  then 


2.1)  Set  x(^>:  =  x('’’+r,r(°l 


2.2)  Return. 


3)  Separate  the  roots  {c,  }  of  Cj._i  into  real  roots  and  conjugate  pairs  of  (nonreal) 
roots:  cTj,  ...,  are  real;  ^m+i^  -"i  ^k~\  nonreal  and 

j+\  ^  + 1  <  j  < A:  —2. 


4)  Do  until  convergence  or  a  limit  on  the  number  of  loops  is  exceeded: 


4.1)  Set  :=b  —  . 


4.2)  Set 

k—i  r  f  1  m 

n  A  A  -  (<7y +cry+i)  +(7y(7j  +  i  xn(A-o-j)i 

j=m+l  L  \  J  ;  =  1 

4.3)  If  not  converged  set  :=x^'‘] 


Enddo. 


4.  Comparisons. 


To  display  some  of  the  advantages  in  the  leapfrog  and  grand-leap  approach  a 
side-by-side  comparison  of  algorithms  is  made  in  Table  1.  The  conventional 
Richardson’s  method  is  compared  to  the  leapfrog  version,  in  which  alternate  steps 
are  omitted,  and  to  the  grand-leap  version  in  which  all  intermediate  steps  have 
been  omitted.  The  period,  k ,  is  assumed  even. 

The  operations  shown  in  Table  1  form  the  kernel  of  a  loop,  the  commands 
for  which  have  been  omitted.  The  details  for  a  complete  algorithm  have  been 
given  already  and  would  be  distracting  here. 

On  any  computer,  reducing  the  number  of  arithmetic  operations,  the 
traditional  goal  of  algorithm  design,  is  an  advantage.  In  the  case  of  the  leapfrog 
and  grand-leap  versions,  it  is  a  thin  advantage  but  an  advantage  nevertheless  and 
one  that  is  unexpected.  (Since  Richardson’s  method  is  a  Krylov  subspace  method, 
the  number  of  matrix-vector  multiplications  cannot  be  reduced.)  It  is  a  further 
advantage  on  supercomputers  that  do  chaining  that  there  are  more  terms  in  the 
leapfrog  expression  for  than  in  the  conventional  expression. 

Some  additional  comment  is  needed  on  how  arithmetic  operations  are 
counted.  The  number  of  arithmetic  operations  given  in  the  table  is  based  on  the 
assumption  that  there  is  no  mixed  real  and  complex  arithmetic.  Let  us  consider 
when  mixed  arithmetic  occurs.  The  Table  1  parameters  are 
{r, ,  (7, ,  L  !  L  L  )  }•  the  Hermitian  symmetric  positive  definite 


4.  TA  A.*' 
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case,  the  roots  of  Rj^  are  real,  the  Table  1  parameters  are  real,  and  there  is  no 
mixed  arithmetic.  (It  is  reasonable  to  assume  this.  A  Hermitian  positive  definite 
system  could  be  solved  with  nonreal  parameters,  however.)  If  A  is  a  general 
complex  nonsymmetric  matrix,  all  Table  1  parameters  are  general  complex 
quantities  and  since  the  matrix  is  complex,  there  is  again  no  mixed  arithmetic. 

Mixed  arithmetic  occurs  in  Richardson’s  method  if  A  is  real  and 
nonsymmetric,  for  then  the  Table  1  parameters  are  general,  complex  quantitie  . 
whereas  the  other  quantities  are  real.  If  A  is  real,  it  is  reasonable  to  assume  that 
polynomials  and  Ci^_i  are  real.  The  roots  may  then  be  grouped  in  conjugate 
pairs,  and  the  leapfrog  and  grand-leap  methods  performed  in  real  arithmetic. 
Richardson’s  method,  however,  requires  complex  arithmetic,  and  the  number  of 
arithmetic  operations  is  effectively  larger  than  shown  in  Table  1.  In  this  case,  one 
would  not  want  to  consider  Richardson’s  method,  which  was  the  motive  for  using 
the  leapfrog  method  in  [SmSaSfij. 

Now  we  come  to  an  aspect  of  these  comparisons,  namely  the  effect  on  10 
due  to  the  solution  of  large  systems,  that  is  important  to  take  into  account  but  is 
necessarily  limited  due  to  the  range  of  the  subject. 

The  limitation  made  here  is  to  consider  only  programmer-controlled  storage, 
from  among  a  list  of  topics  required  for  a  more  complete  discussion  that  includes 
architectures,  specific  application  problems,  and  implementation  details.  The 
reader  may  object  that  although  it  is  reasonable  to  dismiss  architectures,  it  is  still 


not  reasonable  to  restrict  discussion  in  quite  this  way.  For,  the  typical  user  is 
running  problems  on  a  virtual  memory  machine  and  is  beset  with  multiple 
worries  that  deserve  attention,  such  as  memory  “touches”,  or  the  loading  of 
vector  registers,  or  the  losses  due  to  flushing  a  cache.  Unfortunately,  such 
transfers  between  memory  levels  are  hardware  dependent  and  simply  cannot  be 
analyzed  within  the  scope  of  this  paper;  the  conclusions  reached  here  below  do 
not  necessarily  hold  in  these  cases.  It  should  be  noted,  however,  that  even  for 
virtual  memory  systems,  there  exist  limits  [EcclSSj  that  compel  the  use  of  explicit 
I/O  commands  similar  to  those  in  Table  1. 

One  final  comment  to  justify  the  narrow  focus  that  is  taken;  It  is 
characteristic  of  many  supercomputers  that  only  programmer-controlled 
peripheral  storage  is  available  for  large  problems  and  when  needed  is  usually 
responsible  for  languid  performance.  This  dismal  fact  often  attracts  comment. 
For  example,  Ortega  and  Voigt  observe,  “The  [programmer-controlled  10 
problem  produced  by  very  large  problems  [isj...  known  to  be  potentially 
devastating  on  high  performance  systems  ...  .”  [OrVo85j.  Programmer-controlled 
storage  includes  system  commands,  custom  utilities,  and  the  less  efficient  choice, 
depending  on  circumstances,  of  Fortran  commands.  Only  Fortran  commands  are 
given  in  Table  1. 

In  order  to  weigh  the  efi'ect  of  transfers  from  peripheral  storage,  a  large  set^of 
linear  equations  is  assumed.  This  vague  statement  will  be  sharpened  in  order  to 


arrive  at  a  rather  specific  assumption.  The  discretization  of  coupled  partial 
diflferential  equations  in  three  dimensions  yields,  in  some  applications,  leviathan 
systems  of  order  ten  million  complex  unknowns.  Such  problems  lead  to  the 
assumption  that  a  matrix  multiplication,  which  may  involve  a  preconditioning, 
absorbs  the  primary  memory  and  that  processing  after  a  matrix  multiplication 
requires  reading  in  a  vector  from  disk.  This  assumption  is  seen  in  Table  1  when, 
for  example,  in  the  leapfrog  algorithm,  r^'  must  be  read  from  disk  after 


ing  t  =  Arf* 


computing 


Under  these  conditions,  a  third  advantage  of  the  leapfrog  and  grand-leap 
algorithms  is  seen:  there  are  fewer  READ’s  and  WRITE ’s. 

In  an  actual  implementation,  it  may  well  happen,  for  example,  that  two 
matrix  READ’S  are  not  necessary  in  any  algorithm  and  that  in  the 

conventional  algorithm  need  not  be  written  on  disk.  Conditions  will  vary,  and 
the  results  in  the  table  are  only  representative.  If  the  assumption  on  matrix 
vector  multiplications  is  not  valid,  the  comparisons  would  change,  but  it  is 
plausible  that  for  large  problems  there  would  remain  an  I/O  advantage  to  the 
leapfrog  and  the  grand-leap  formulations. 


5.  Second  Order  Iterations. 


In  the  real  eigenvalue  case,  residual  polynomials  of  practical  value  are  orthogonal 
polynomials,  and  satisfy  a  three  term  recursion.  This  elegant  property  yields 
second  order  methods,  which  have  an  extra  term  in  the  expression  for  the  new 
iterate  as  compared  to  Richardson’s  method.  Richardson’s  method  is  also  called  a 
first  order  method  and  a  second  order  method  sometimes  called  Richardson’s 
second  order  method.  In  a  second  order  method,  each  new  iterate  is  optimum  in 
the  sense  that  the  residual  polynomial  satisfies  an  L2-optimality  property,  to  be 
be  discussed  in  §6.  The  Chebyshev  iteration,  employed  by  Manteuffel  :Mnt77’,  is 
an  example.  In  the  case  of  a  first  order  method,  x^^^  is  optimum  if  the  residual 
polynomial,  i?*.,  is  optimum,  but  is  not  optimum  for  1=?^ A:,  a  fact  commented 
on  previously  in  the  Notes  for  Algorithm  1.  There  is  a  cost  in  the  second  order 
method  for  optimality  at  each  step:  a  larger  number  of  arithmetic  operations  and 
a  greater  use  of  storage  compared  to  Richardson’s  method. 

The  objective  is  a  second  order  method  for  which  only  even  numbered 
iterates  are  computed  using  information  only  at  even  numbered  steps,  a  method 
hereafter  called  a  second  order  leapfrog  method.  In  the  Chebyshev  case,  an 
algorithm  will  be  given. 


• « 


5.1.  The  Second  Order  Iteration.  Some  preliminaries  are  needed.  Let 

be  the  given  initial  guess.  Define  to  be  zero  and  for  0  <  fc, 

The  second  order  iteration  requires  a  set  of  parameters  1 

that  are  given  explicitly  in  the  Chebyshev  case  in  Algorithm  3,  and  derived  in  a 
general  way  in  Algorithm  4.  Assume  these  parameters  are  given.  The  iteration 
may  now  be  stated.  Let  r^'^^  =  b  —  Ax^°^.  For  A:  >1, 

+  (5.1.1) 

and 

=  b  —  Ax^*^. 

5.2.  Second  Order  Leapfrog.  The  derivation  is  somewhat  lengthier  than  in 

the  case  of  Richardson’s  method  due  to:  the  need  to  express  in  terms  of 

information  at  step  k  —2;  and  a  complication  involving  the  residual  vector. 


First,  an  expression  for  A:  >2,  is  obtained  in  terms  of  information  at 


x{*^)=x(*  1) 


it  follows  that 


xl^'l^x**  1)  _ 


Now  use  (5.1.1)  in  the  last  equation  to  obtain 


Define 


^^{k-2)  ^^(k-l)  _^{k-2)  ^ 


Then 


x(*^)=x{*  + 


It  remains  to  express  and  in  terms  of  information  at  step  k  —  2 

The  expression  for  is  simply 


aA'^'^  ^—AAx^’^^  . 


Finally,  to  obtain  Ax^'^\ 


To  summarize,  the  formulas  to  go  from  step  /c  —  2  to  step  k  are 


Initially, 


r(*)  =  b  —  , 


x(°)  =  given , 


=  b  —  Ax^®^, 


w? 


Table  2 


Conventional  Second  Order 

Leapfrog  Second  Order 

READ 

READ 

READ 

READ 

w  =  a*.  -f  4- 

WRITE 

WRITE  w 

x(fc)  =  +  w  + 

WRITE 

WRITE  x^^^ 

READ  A 

READ  A 

v=Ax^^“^^ 

v=Ax^*^^ 

READ  b 

READ  b 

r(*-i)=b— V 

i»(^)=b — V 

READ 

READ  w 

Ax^^~^'>=-ik  +afcr**"^* 

Ax^*)  =  afc.,ir(*=*  +  7i  +  iw 

WRITE 

WRITE  Ax^*'^ 

READ 

4- 

WRITE  x(*) 

WRITE 

1 

READ  A 

READ  A 

V  =  Ax^*) 

1 

READ  b 

> 

1 

X> 

11 

' 

2  matrix  mults.  1 

i 

2  matrix  mults. 

6  vector  READ'S 

5  vector  READ’s 

2  matrix  READ’S 

2  matrix  READ’s 

4  vector  WRITE’s 

4  vector  WRITE’s 

6N  adds 

6N  adds 

4N  mults. 

4N  mults. 

j»  V  ViWVyvVffl'T'.T' 
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5.3.  Comparisons.  Under  the  same  assumptions  as  for  the  previous  set  of 
comparisons,  the  two  versions  of  the  second  order  iteration  are  compared  in  Table 
2.  There  are  fewer  advantages  of  the  leapfrog  algorithm  in  this  case  since  the 
number  of  arithmetic  operations  and  the  number  of  WRITE ’s  is  the  same.  The 

‘  advantages  are  that  there  are  fewer  READ’S  and  a  greater  number  of  terms  in  the 

sum  defining  in  the  leapfrog  version.  Note  that  variations  are  possible,  for 
example,  in  recomputing  w  in  the  leapfrog  version,  and  that  the  arrangement  of 
terms  used  here  is  not  necessarily  suitable  for  a  particular  problem  or 
architecture. 

5.4.  Algorithm  for  the  Second  Order  Leapfrog  Method  in  the 
Chebyshev  Case.  For  the  convenience  of  the  reader,  an  algorithm  is  given 
below  for  the  case  of  Chebyshev  parameters.  As  before  with  Algorithms  1  and  2, 
no  attempt  is  made  to  incorporate  I/O  statements. 

Algorithm  3.  (Second  order  leapfrog  iteration  with  Chebyshev  parameters.) 

Purpose.  Execute  the  leapfrog  form  of  the  second  order  iteration  for  the 
Chebyshev  case.  The  parameters  are  the  same  as  for  the  standard  second  order 
Chebyshev  iteration  as  used  for  example  in  the  Manteuffel  algorithm  ;Mnt78, 
Ashb85j. 

Input.  Matrix  A,  right  side  b,  and  initial  guess  also  a  pair  d  and  c  such 


that  d  is  the  center  and  d  ±  c  are  the  foci  of  a  family  of  ellipses  over  which  the 


Chebyshev  residual  polynomial  is  (nearly)  minimum  with  respect  to  the  uniform 
norm.  The  ellipse  parameters  are  assumed  known,  for  example,  as  output  from 
the  Manteuffel  algorithm  [y\shb85].  In  the  general  non-Chebyshev  case,  this 
algorithm  could  be  easily  modified  to  allow  {aj^ }  and  }  to  be  input  parameters, 
say,  from  Algorithm  4. 

Output.  The  algorithm  generates  a  set  of  optimum  iterates  converging  to  the 
solution  of  Ax  =  b  if  the  restrictions  are  satisfied. 

Restrictions.  The  restrictions  are  the  same  as  for  Algorithm  1. 

Notes.  The  and  '7^.  parameters  need  not  be  array  variables;  the  subscripts 
aid  clarity. 

1)  Set  :=  b  —  Ax^°l 

2)  Set  Oj  ;=  l/d. 

3)  Set  := 

4)  Set  := 

5)  Set  . 

6)  Set  72  =  da2  I- 

7)  Do  A:  =  2  by  2  until  either  convergence  or  a  l>mit  is  exceeded; 


•;*1 


I 

'4 


I 


6.  L2-Optimum  Parameters. 


If  either  the  L2-  or  /2~Qorin  is  used  to  define  optimum  residual  polynomials,  then 
it  turns  out  that  optimum  residual  polynomials  form  a  family  of  orthogonal 
polynomials  if  the  inner  product  (either  integral  or  sum)  is  defined  over  a  real  set. 
From  this  fact,  algorithms  follow  for  the  computation  of  the  r-parameters  for 
Richardson’s  method,  the  cr-parameters  for  the  grand-leap  method,  and  the 
parameters  for  the  second  order  method,  which  are  presented  in  this  section.  The 
assumption  that  the  inner  product  is  defined  over  a  real  set  usually  means  that 
the  eigenvalues  of  the  system  matrix  are  real.  The  Chebyshev  case  is  an 
exception  for  which  the  eigenvalues  need  not  be  real.  (Since  Chebyshev 
polynomials  form  an  orthogonal  family,  Chebyshev  residual  polynomials  are  Lo- 
optimum  as  well  as  L-^^-optimum.) 

The  algorithms' in  this  section  generalize  to  the  case  of  an  inner  product 
defined  over  a  contour  in  the  complex  plane.  (A  generalization  may  be  based  on 
[SaSm88].) 


6.1.  L2-OptimaIity.  Some  notation  is  necessary.  Let  F  be  an  interval  or  a 
union  of  intervals  on  the  real  line  (generally,  F  could  be  any  measurable  subset) 
and  let  w  be  a  positive  weight  function  on  F.  Define 


(/.  ?)u,  = 

L,  p 


rB 


■where  L  =  fw(^)d^.  The  set  T  may  be  assumed  to  be  real  by  a  linear  change  of 
r 

variables  if  necessary.  In  practice,  rather  than  the  continuous  inner  product,  one 
would  use  a  discrete  inner  product  of  the  form 

1  M 

where  m(^)  is  a  measure,  such  as  i  —  ^i_il  .  A  norm  is  defined  by 

II  /II  ^=(/,  /)t.  ■ 

An  (L^-)  optimum  residual  polynomial  of  degree  k  is  defined  to  be  that  residual 
polynomial,  ,  with  the  smallest  norm, 

II  /2fcll^<llnll^ 

where  P^.  is  any  residual  polynomial  of  degree  k.  Ideally,  T  should  contain  the  spectrum 
of  A,  and  conform  to  the  spectrum  as  closely  as  possible.  Thus  if  the  spectrum  were 
contained  in  the  union  of  two  intervals,  P  should  also  be  the  union  of,  if  possible,  the 
same  two  intervals.  How  to  find  the  interval  or  union  of  intervals  containing  the 
spectrum  is  a  difficult  problem,  and  is  not  considered  here;  the  reader  is  referred  to  the 
Manteuffel  algorithm  [Mnt78j  (which,  however,  computes  only  one  interval  containing 
the  spectrum.)  If  {/?,  }  is  a  set  of  optimum  residual  polynomials,  then  [Stie58|  they  form 
an  orthogonal  set  with  respect  to  the  modified  weight  function, 


lL 


if  and  only  if  i 


8.2.  The  Recursive  Property  of  Orthogonal  Polynomials.  The  well-kno-^vn 
three  term  recursion  for  orthogonal  polynomials  is  recalled,  a  property  that  yields  not 
only  a  second  order  iteration,  which  is  derived  here,  but,  also  in  §7.1,  an  algorithm  for 
computing,  among  other  things,  the  roots  of  the  optimum  residual  polynomial,  needed  in 
order  to  execute  Richardson’s  method. 


Define  to  be  zero,  and  let  <j>Q  be  a  nonzero  constant.  A  family,  {<^j.  :0</c},  of 
orthogonal  polynomials  satisfies  a  three  term  recursion:  for  1  <  A: , 


nk<i>k-2[i)  > 


(6.2.1) 


where  are  recursion  coefficients  given  by 

0k  {^^k-V^k-l)v 

^k  II  «?ifc_ill  I 

Ik  {i^k-\^^k-2)w 


One  coefficient  is  a  parameter  that  allows  a  normalization,  such  as,  for  example. 
II  <i>J  ^=1.  If  {<i>k:0<k}  is  a  family  of  residual  polynomials,  the  desired 
normalization  is  =  which  yields  +  1  and,  with  Rk{^)  = 


^fc(0=(“ic^+n'fc  +i)i?it-i(0-7fc-Rfc-2(0  • 


(6.2.2) 


6.3.  Second  Order  Iteration.  The  recursion  for  the  residual  polynomials  yields  an 
iteration  for  which  is  in  the  following  sense:  The  error,  e^^^=x— 

satisfies  e^^^  =i?jt(A)e^°^,  where  is  an  L2“OPtinium  residual  polynomial. 

To  derive  the  iteration,  replace  ^  with  A  in  (6.2.2)  and  multiply  on  the  right  by  r'° 
to  get[Stie58],  for  1  <  /:  and  r^~^^  defined  to  be  zero. 

Replace  r^-^^  by  b— Ax^-'^,  j  =k  ~2,k—l,k,  and  multiply  on  the  left  by  A~'  to  obtain 

xf'')  =(1  +7^)x(*-') 

Initially,  is  given  and  r^°^  =b  —  Ax^°^. 

The  iteration  is  usually  expressed  in  terms  of  the  iterant  difference,  x^^^  —  as 

in  (5.1.1). 


8.4.  A  Method  for  the  Roots  of  A  matrix  will  be  derived,  the  eigenvalues  of 

which  are  the  roots  of  C^t-r 

Recall  from  §6.1  that  an  optimum  residual  polynomial  of  degree  k  is  defined  to  be 
that  residiial  polynomial,  that  solves  the  weighted  least  squares  problem 


•  *-•  i.  • .  • 


•*  V  V 
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where  is  any  residual  polynomial  of  degree  k.  Also  if  (P,  )  is  a  set  of  optimum 
residual  polynomials,  then  they  form  an  orthogonal  set  with  respect  to  the  modified 
weight  function,  see  (6.1.1). 

The  roots  of  orthogonal  polynomials  may  be  computed  by  a  stable  algorithm  based 
on  the  fact  that  the  roots  are  the  eigenvalues  of  a  symmetric  tridiagonal  matrix,  S;. . 
The  algorithm  is  called  the  Stieltjes  algorithm  and  matrix  is  called  the  Jacobi  matrix. 
The  Stieltjes  algorithm  is  recommended  for  computing  the  optimum  Richardson's 
method  parameters,  which  are  the  reciprocals  of  the  roots  of  the  optimum  residual 
polynomial.  Matrix  may  be  modified  by  ore  element  to  obtain  a  matrix  the  nonzero 
eigenvalues  of  which  are  roots  of 

If  only  the  roots  of  and  are  desired,  it  is  not  important  ..hat  Rj..{0)  =  1.  It 

is  preferable  to  work  with  the  normalized  family 


MO 


RAO 
1!  R,  II 


6.4.1.  The  Roots  of  The  elements  of  matrix  Sj^  are  the  coefficients  of  the 

three  term  recursion  satisfied  by  {d>,  }. 


It  is  convenient  to  write  the  recursion  (6.2.1)  in  the  slightly  different  form 


Ik 


W 

I 


r*.)rK  iTH  dv.  dn 


The  procedure  begins  with  the  initial  polynomial  <f>Q.  Since  4>q  is  a  constant  such 
that  11  <I)q\\  =1,  it  follows  that 


M0  =  - 


Next,  to  compute  the  root  of  <^i,  it  follows  from 


?  )  =  5  1 100(0  +  «  12<?^l  (^)> 


that,  if  is  to  be  orthogonal  to  <^q, 


■ 


Of  course, 


Now  assume  Sfc_i  has  been  computed,  2<k.  Since  it  is  the  (A:— l)X(fc— 1)  principal 
submatrix  of  only  the  last  row  and  column  of  S^.  need  be  computed,  a  total  of  three 
nonzero  elements.  Since  the  polynomials  are  normalized,  Sj^  may  be  proved  to  be 
symmetric.  Hence  only  yt  and  are  required. 

Matrix  S^^-i  yields  the  roots  pj  ^_i,  fc-i  ^k-i-  Let 


Then 


^k-iiO 


The  elements  to  be  computed  are  and  These  are  unknowns  in  the  relations 


(with,  ooursCj  ^ 

^(l>k-2{0=^k-l,k-2^k-3{0+^k-l,k~l^k-2U)+Sk-l.k<^k-l{0 

and 

^4>k-l{0=^k,k-l^k-2i0+^kk  <l>k-liO+^k,k+l<i>kiO- 

Orthogonality  yields 

^k-l,k  —{i<i>k-2^  ^k-\)iv3 

and 

^kk  =(^<Afc-l)  <^k-l)iw 

This  completes  the  computation  of  Sj^.  An  algorithm  (Algorithm  4)  is  gi 

6.4.2.  The  Roots  of  We  have 

<Aik(0 

Since  R^(^)  =  — — -  ,  it  follows  that 

4>kW 


(6.4.1) 


(6.4.2) 


(6.4.3) 
,'en  in 


■^y> 


<;^A(e)=4(e)[«^*(o)/^o(e)!-<?i;fe(o)|ec4-,(o)- 

Equation  (6.4.1)  therefore  gives 

~^k,k  +  \^k[^)  >  (6.4.4) 

where  -Syti  .'=.5;^  ^^l0l|.(O)/(^o(<f)'  course,  <j>Q{i)  is  a  constant.)  Define  a  lower 

Hessenburg  matrix  S;j=(i,j)  by  setting  s^j:=s^j  unless  i=k,j  =  \,  in  which  case 
has  been  defined. 

The  equation 


C^( 0  =  Sjt  ^(0  + -Sit  ,A: 

now  becomes 

e^( 0 = s,  ^( e)  -  s, (0)  (ec,  _  1  ( 0 1  . 

The  roots  of  ^C^_i(i^)  are  therefore  the  eigenvalues  of 

8.5.  Leading  Coefficient  g^-i.  Preparations  are  nearly  complete  for  the 
computation  of 


x('=)=x(°^  +  Q_i(A)r(°>  . 


There  is  one  remaining  detail,  an  expression  for  the  leading  coefficient  such  that 
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7.  Algorithms. 


In  this  section,  algorithms  are  given  for  the  root  finding  algorithms  for  the  general  1-2- 
case  and  also  for  the  Chebyshev  case. 

7.1.  Algorithm  for  Normalized  Residual  Polynomials. 

Algorithm  4-  (Compute  the  recursion  coefficients  of  a  specified  orthogonal  family, 
and  the  roots  and  leading  coefficients  of  the  degree  k  orthogonal  polynomial  Ok  of  the 
family.) 

Purpose:  Generate  the  factored  form  of  successive  normalized  (real) 
orthogonal  polynomials,  <^, ,  i  =0,...,k;  and  the  residual  polynomial  recursion 
parameters,  (o;;;.!  7*}-  Additional  output  is  described  below.  If  polynomials  are 
optimum  with  respect  to  a  weight  function  w,  they  are  then  orthogonal  with  respect  to 
the  (real)  weight  function  3.nd  this  will  be  the  weight  function  used  below.  Note 

that  RkiO  =  is  3,  residual  polynomial. 

Input.  A  subprogram  must  be  provided  to  compute  an  inner  product 
[f  ,g)  =  ^j  f  where  L=fd^.  In  practice,  this  subprogram  would 

L  r  r 

M 

compute  a  discrete  inner  product  if  ,g)  = - D  f  Input  to  the 

^.=1 

subprogram  would  be  the  number,  M,  of  nodes,  the  nodes  ^,,1=1,  and  the 

weight,  u;(^, )  (an  array  or  external  function). 


Input  to  Algorithm  4  then  consists  of  input  to  the  subprogram,  the  subprogram 
itself,  and  the  degree,  k,  of  the  highest  degree  normalized  orthogonal  polynomial. 

Output.  The  algorithm  generates  (l)  a  two  dimensional  array  of  roots,  {/>_,,  },  of  p, . 
0  <  z  <  A:,  required  for  (a  modificaiion  of)  Algorithm  1  in  the  non-Chebyshev  case;  (2) 

parameter  ,  needed  for  Algorithm  2  in  the  special  case  k  =1;  (3)  the  Jacobi 

Pii 

matrix  (4)  <j>Q,  and  <^jt(0),  needed  for  Algorithm  5;  (5)  the  recursion  coefficients 
{a).,  for  the  residual  polynomials,  needed  for  (a  modification  of)  Algorithm  3;  and 
(6)  the  array  of  leading  coefficients,  z/, ,  of  (j>^,  needed  for  Algorithm  5. 

Restrictions.  Degree  k  must  satisfy  k  <M.  The  restriction  on  F  is  that  the  nodes. 
1^,},  lie  on  the  real  line,  which  holds  if  A  is  Hermitian  symmetric  positive  definite. 
However,  it  is  not  necessary  that  A  be  Hermitian  symmetric  positive  definite.  For 
example,  if  A  is  real  nonsymmetric  then  F  may  be  taken  to  be  the  major  axis  of  an 
ellipse  enclosing  the  nonzero  spectrum  and  the  inner  product  taken  to  be  the  inner 
product  defining  Chebyshev  polynomials;  see  §7.3. 

Notes.  The  reciprocals  of  the  roots  of  <j>i^  are  the  r-parameters  needed  for 
Richardson’s  method,  and  are  general  input  for  a  modification  of  Algorithm  1. 

This  algorithm  directly  computes  the  recursion  coefficients,  for  normalized 

polynomials;  see  (6.4.1).  These  are  not  the  recursion  coefficients,  (a^,,  7j(.}>  residual 
polynomials.  An  expression  for  the  a^.,  coefficients  in  terms  of  the  coefficients  is 
given  as  follows. 


4 
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4)  Set  1^]  :  = 


5)  Set  (jii(O)  :=  iyy{-Pu) 


6)  For  2  <  t  ^  k,  do: 


6.1)  Set  the  first  i  —2  elements  of  the  last  column  of  ,  column  i,  equal  to  0. 


6.2)  Formulas  (6.4.2)  and  (6.4.3)  give  the  remaining  two  elements.  Set 


•Sit'  •—  (C<^t-li<?^i-l)fu)  • 


6.3)  Compute  the  eigenvalues  of  (the  symmetric  matrix)  .  Set  the  roots,  { 
of  to  these  eigenvalues. 


6.4)  Sett',:=- 


II 


6.5)  Set<^,(0)  :=(-l)V.  /Ji,  •  •  •  Pi 


6.6)  Set  a,  := 


6.7)  Set  7,  :  = 


Si,,+i<?^.(0) 

Si,i-l<i^i-2(0) 

S.  ,,+!</>.  (0) 


Enddo. 


i  y  J 

^  -  -  >  •<>’ 
'  V  V  .  ■ 
-/'-•/nA 


0  •• 


c-w 


■>>.7 

“  V  V 


V^-V' 

^  j"  ^ 


'  ^ 


7.2.  Algorithm  for  the  Grand-Leap  Parameters. 


Algorithm  5.  (Compute  Sj^,  and  the  roots  and  leading  coefficient  of  j 

Purpose.  Compute  the  cr-parameters  and  parameter  gi^^i  needed  for  the  grand-leap 
algorithm;  these  parameters  are  the  roots  and  leading  coefficient  respectively  of 

Input.  A  matrix  (■s,y)fc+ix;S:+i>  such  as  ^  from  Algorithm  4,  nonzero  parameters 
(f>Q,  and  <^jt(0),  such  as,  also,  from  Algorithm  4,  and  period  k. 

Output.  The  algorithm  generates  the  k  ~l  roots,  {c,  },  and  leading  coefficient  g^.^^ 
of  the  “polynomial  preconditioner”  These  quantities  become  input  for  Algorithm 


Restrictions.  There  are  no  restrictions  other  than  those  imposed  on  the  input 
parameters. 


1)  Initialize  k'>2  . 


2)  Set  :=  s^j  for  1  <  i ,  j  <  /c  (sj^j  will  be  reset  in  step  3).) 


3)  Set  -sjki:=5jt  ;t+i<Ajfc(O)/0o  • 


4)  Set  S^.  :=(s,y)  . 


•5)  Compute  the  eigenvalues  of  S^.  Set  the  roots  of  C^_j  equal  to  the  nonzer 
eigenvalues  of  S^. 


7.3.  The  Chebyshev  Case.  For  this  special  case,  the  grand-leap  parameters  may  be 
determined  with  explicit  inner  products. 

7.3.1.  Chebyshev  Orthogonality.  Assume  d,c  #0,  and  0  is  not  in  the  interval 
[d  —  c ,  d  +  c].  If  d  and  c  are  real,  this  is  equivalent  to  assuming  d  >  0,  and  d  —  \  c\ 
>  0.  Let  T^{fx)  be  the  Chebyshev  polynomial  of  degree  i  defined  by  the  familiar 
recursion  To  =  l,Z'j(^)=/:/,  and  for  1< 

Let  ~  d,)/c\  be  the  shifted  and  translated  Chebyshev  polynomial.  The 

V’t(0 

Chebyshev  residual  polynomial  is  therefore  ■—  ■■ 

(0) 

orthogonality  relations  (where  c  >  0  if  real) 


.  The  family  :  0  <  2  }  satisfies  th^ 


d—c 


c 


n,i  =  j=0 


TT 

T’ 


i  =j  9^=0 


0,  i  j 


where 


7.3.2.  Recursions  for  the  Shifted  Chebyshev  Polynomials.  The  recursion  for  T, 


yields  a  recursion  for  {j/j,-  }: 


and  for  1  <  i , 


7.3.3.  Roots  of  Cj^_i  The  roots  of  are  among  the  eigenvalues  of  S^..  .Am  explicit 
expression  for  Sj.  requires  Sj.  and  d»fc(0).  To  determine  these  quantities,  the  three  term 
recursion  for  the  normalized  polynomials  is  needed.  The  normalized  polynomials  are: 


The  recursion  is 


+  <^i(0  > 


‘Ao(0  +  '^ci>i(0  +  y  <^2(0 


a 


and  for  2  <  t , 


7. 3. 3.1.  Matrix 


To  modify  Sf. 


From  the  recursion  for  [d), },  it  follows  that 


to  obtain  S;^,  d»*;(0)  is  needed.  For  1  <k, 


Mo)  = 


V^ifc(O) 

11  V'fc  II 


coshA:  cosh  ^ 
V^7r/2 


which  follows  from  Ti^{(j,)  =  cosh/tcosh 


51 


tA.(0  =  2^— 

c 


The  leading  coefficient  of  ipf.  is  therefore  —  —  .  This  combined  with  the  expression 


for  rpf.  (0)  gives 


dk-l  n 


coshA:  cosh  M - 


7.3.5.  Algorithm  for  the  Grand-Leap  Parameters  in  the  Chebyshev  Case. 

Algorithm  6.  (Compute  the  parameters  for  the  Grand-Leap  Algorithm  in  the 
Chebyshev  Case.) 


Purpose.  Compute  Vq,  and  cr^,  in  the  Chebyshev  case  as  required  for 

Algorithm  2. 


Input.  Ellipse  parameters  d  and  c  and  period  k. 


Output.  The  A:  —  1  roots,  {o, },  and  leading  coefficient  gh-i  of  the  polynomial 
preconditioner 


Restnctwn.-i.  If  the  grand-leap  algorithm  is  to  converge  then  the  ellipse  parameters  musi, 
satisfy  the  same  restrictions  as  in  Algorithm  1. 


!^i!S 


i 


aS 


Jll 


Notes.  The  algorithm  uses  a  matrix  S^.  that  is  not  defined  if  c  =0.  In  this  case,  the  set 
of  Chebyshev  residual  polynomials  reduces  to  the  family  {R^  =  {<;  —  d)^ /d^ :  0  <  k  . 
which  is  not  an  orthogonal  family  for  any  weight  function.  The  case  c  =  0  is  often  used 
in  the  Manteuffel  algorithm  in  order  to  compute  improved  ellipse  parameters  adaptively. 
If  one  believed  that  c  =  0  then  Richardson’s  method  would  converge  in  a  single  step  if 
the  matrix  were  normal,  in  which  case  no  need  exists  for  the  grand-leap  formulation.  If 
one  were  computing  ellipse  parameters  adaptively,  then  the  parameter  computation 
technique  reduces  to  a  sequence  of  matrix  vector  multiplications,  and  again  the  grand- 
leap  formulation  is  not  desired.  For  these  reasons,  if  c  is  small  relative  to  d  the 
algorithm  halts.  The  halting  criterion  is  a  comparison  of  1  c  /d  |  ^  to  the  machine  epsilon, 
denoted  in  the  algorithm  by  “mach  eps”  and  defined  to  be  the  largest  machine  number. 
£,  such  that  the  floating  point  sum  1  +  e  equals  the  machine  number  1. 

As  a  final  note,  there  does  exist  an  analog  of  Sj^  in  the  degenerate  case  (c  =  0),  and 
the  algorithm  control  could  branch  to  the  computation  of  the  eigenvalues  of  the  analog 
of  S*,,  but  the  lack  of  a  practical  need  obviates  this  version. 


1)  Set  Tj  ;= 


—c  IT  1 2 +d 


2)  If  A:  =  1  or  I  c  /d  I  <  Vmach  eps  then  return. 


3)  Set 


<i>,(0)  :  = 


coshkcosh  ^ 


c 


MO) 


4)  Set  the  roots  {  CTj,  crj^_j}  of  equal  to  the  nonzero  eigenvalues  of  matrix 

(7.3.1). 


5)  Set  the  leading  coefficient  g^^j^'of  C^_i  equal  to 


9k-i  — 


coshArccsh  ^ 


£ 

c 


8.  Summary. 

The  leapfrog  and  grand-leap  variants  of  Richardson’s  method  and  a  general  second 
order  method  have  been  described.  A  comparison  among  the  methods  and  variants 
shows  that  there  are  advantages  either  to  omitting  every  other  iterant  or  to  omitting  all 
iterants  (except  the  last). 

The  leapfrog  and  grand-leap  variants  require  sets  of  parameters  that  may  be 
computed  from  the  eigenvalues  of  a  matrix.  In  the  leapfrog  case,  the  matrix  is  the  same 
as  that  which  expresses  the  roots  of  a  member  of  a  family  of  orthogonal  polynomials  as 
the  eigenvalues  of  a  symmetric  tridiagonal  matrix.  This  matrix  may  be  modified  slightly 
to  yield  the  parameters  needed  for  the  grand-leap  variant. 


^  V  V  ■.  --  w 


Algorithms  for  the  leapfrog  and  grand-leap  methods  are  given  in  the  Chebyshev 
case.  In  the  Chebyshev  case,  explicit  values  for  the  elements  of  the  tridiagonal  matrix  are 
well  known  and  need  not  be  computed. 
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